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SOLUTION OF NONLINEAR SYSTEMS 
L. R. Turner 

Lewis Research Center, National Aeronautics and Space 
Administration, Cleveland , Ohio 

Introduction 

The problem of solving systems of nonlinear equations has been rela- 
tively neglected in the mathematical literature, especially in the textbooks, 
in comparison to the corresponding linear problem. Moreover, treatments 
that have an appearance of generality 1,2 fail to discuss the nature of the 
solutions and the possible pitfalls of the methods suggested. 

Probably it is unrealistic to expect that a unified and comprehensive 
treatment of the subject will evolve, owing to the great variety of situa- 
tions possible, especially in the applied field where some requirement of 
human or mechanical efficiency is always present. Therefore we attempt 
here simply to pose the problem and to describe and partially appraise the 
methods of solution currently in favor. 

Statement of the Problem 

A problem in nonlinear algebra in several variables may arise in several 
ways. Fairly commonly, a physically oriented problem will contain as 
many well-established, physically independent conditions as there are un- 
known quantities in the solution. That this parity between the number 
of independent statements and unknowns is not required, however, may 
be demonstrated by a simple exercise. 

For any linear system of the form 

Fi = bi — aijXj = 0 
3 

there is a corresponding nonlinear problem of locating the zeros, if any, 
or the minima of the test function 

0i ~ y 1 F iCi t m F m 

l,m 

where Ci, m is an element of any positive definite matrix. The single func- 
tion 0i has the zeros of the system F, = 0, and its minima otherwise repre- 
sent a best solution of the system in some least-squares sense. The prop- 
erties of the original system are closely linked with the properties of the 
Jacobian (or Hessian) matrix 



If | J | 9^ 0, there will be a unique minimum point of 0i that will be a solu- 
tion if the initial system is consistent. If | / | = 0, there will be a con- 
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tinuous locus of minima (or perhaps of zeros) that, depending on definition, 
may be treated as infinitely many solutions or as no solution. 

Problems that consist of a single statement are common in physics; one 
example is the location of the minimum or other level points of a poten- 
tial function. With several statements, it is frequently expedient to con- 
struct non-negative test functions such as <j> x and to replace the original 
problem by that of finding the level points, usually minima, of <j> i . 

In the nonlinear as in the linear case, the Jacobian matrix J of a poten- 
tial function or of quadratic test function generally characterizes the 
nature of the solution; in the nonlinear case, however, there are additional 
problems. While a nonlinear system may have a unique minimum, it is 
more likely that it will have several minima or level points. Furthermore, 
these level points may be distinct or finitely confluent or they may occur 
in continuous sets* and any combination of these situations may occur in 
a single problem. If the level points are all distinct, then | J | ^ 0 in 
some open region containing each level point. At any of the points of 
confluence, | J | = 0. If there are two or more nearly coincident level 
points, | J | will be very small and may be computationally zero. 

Milne 3 has described a process by which the difficulty of coincident or 
nearly coincident roots may be resolved. An attempt is made to solve 
jointly the initially given system and the added equation | J | = 0. If 
the augmented system has a subsystem with a nonvanishing Jacobian 
determinant at the space point in question, the point is a multiple root or 
an approximation to a set of nearly coincident roots of the original system. 
It is easily demonstrated, however, by consideration of the well-determined 
problem </> = x + y A = 0, that a single, use of this process may be inade- 
quate to resolve the difficulty. 

A very practical difficulty arises in the course of locating the minima of 
a potential or test function: if the process of solution is one designed to 
seek a level point of <t > i , the computation may tend toward a maximum or 
a saddle point, as well as toward the true minimum. Booth 1 points out 
that these various types of level points may be distinguished by examining 
the quadratic form 

d 2 (j> i 

2 ^ - - - 'Xrx m 

l,m dXiOX m 

which is positive definite only at a minimum. This test will fail if the 
level points being tested are not simple; moreover, no formal test evaluated 
at a single point will distinguish between a relative and an absolute mini- 
mum. 

Once a root or extremal at the vector location Y has been determined, 
a further solution may be found by consideration of the successor problem : 
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where. || A — Y |j is a suitable norm. The solution Y is discriminated 
against weakly if p = 1 and more strongly if p > 1. A desirable choice 
of p would seem to be the square distance X ( Xi - y t f if the test function 

l 

has quadratic minima. If the problem is one of locating minima rather 
than zeros, the original problem must be returned to once an approxima- 
tion to a second level point is found. 

It has been noted that this method of discrimination involves the the- 
oretical difficulty that, when the locus ot a zero is obtained numericallv 
and therefore approximately, the successor problem has a pole and a nearly 
coincident zero (Morton A. Hyman, personal communication). This will 
become a real difficulty if the initial problem has nearly coincident zeros. 
In situations of this kind, Milne recommends the use of extended precision 
arithmetic. 

Finally, we must remember that since it is impossible to locate an in- 
finite number of roots, limits on the number of roots or on their range of 
locus are always required. 

Methods of Solution 

With so great a variety of situations it seems unlikely that a general 
method could be used to solve all problems, even if the? number of variables 
were limited. Experience gained through observing the solution of many 
problems indicates that use of a mixture of techniques greatly increases 
the probability of success. 

The methods usually recommended are of two types. The first involves 
use of functional iteration in either an extension of Newton’s method to 
several variables or, in some special cases, a method of successive substitu- 
tion. The second is a method of “descents” by which the problem of 
solving a system of nonlinear equations is reduced to that of solving an 
infinite sequence of nonlinear equations in one variable. 

Functional Iteration 

Methods of functional iteration include any method in which the initial 
system of equations = 0 is written in the vector form X = G(X) and 
solved by the iteration X* +1 = G(Xt), the subscripts indicating successive 
iterations. In many applied problems, the separate equations will repre- 
sent physically, independent statements about the solution, and in such 
cases it is usually possible to obtain a solution by writing X* + i — X* — g F 
where 



This is the extension of Newton’s method to systems of equations; usually 
it will be written in the form 

(~) (X, + 1 - X*) = -/<\(X,) 
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A problem of minimizing a potential function or a test function can be 
written in a form appropriate for Newton's method as follows: 



Householder 2 has shown that this process converges if the Jacobian matrix 
is not singular and if the initial estimate X Q is sufficiently close to X. 

The condition that the Jacobian determinant not vanish will be violated 
at multiple roots. To resolve this difficulty, Milne 3 discards temporarily 
one of the original equations, F il = 0, and in place of the original system 
solves the system 

Fi = 0 (i ^ ii) 

| J \ = 0 

If the revised system leads to a root Z and if Fi x (Z) = 0, then within com- 
putational accuracy the original system has a pair of coincident roots at Z . 
More significantly, perhaps, if the initial system is undetermined at or 
near a point Z 0 , the revised system will also be undetermined at this space 
point. Unfortunately, the converse is not true, as may be shown by apply- 
ing Milne's technique to the solution of the equation <t> — x + if — 0. 
The component equations are 

2 x = 0 (1) 

4s/ 3 = 0 (2) 

From this, 

1/11 = 24/ (3) 

and, if equation 3 replaces equation 2, we have | J 2 1 = 96 y and, finally, 

| J z | = 192. The completion of the chain demonstrates that the original 
system was well determined. If equation 1 rather than equation 2 had 
been replaced by | Ji 1 — 24 y = 0, the false conclusion might have been 
reached that the initial system was indeterminate. 

A numerical discrimination between indeterminacy and multiplicity of 
roots is sometimes available. If the degree of nullity of a singular Jaco- 
bian matrix at point Xo is only 1, the characteristic vector V associated 
with the zero root is easily calculated. Then to terms in the first order 
the equation J = 0 is satisfied by Xi(h) = Xo + AV. If by a test compu- 
tation with Xi(fc) as a starting point the root Xo is again found, X 0 is prob- 
ably a multiple root. If each trial Xi (h) leads to a different apparent root, 
the system is probably underdetermined or at least has many nearly coin- 
cident roots. Any computation of this sort is difficult because of rounding 


errors. 
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Aside from the difficulties caused by coincident roots and the general 
difficulties that may occur in any solution of linear simultaneous equations, 
a very important difficulty is that convergence in the large, that is, con- 
vergence to some solution from an arbitrary starting point, is not assured. 
For example, the system 

F 1 = z 2 - y 2 = 0 
F 2 = l + 2xy = 0 

will not converge by Newton’s method for any starting point x = y. This 
difficulty is analogous to the impossibility of finding a complex zero of an 
algebraic equation with the use of only real arithmetic. In this and sim- 
ilar-cases, an initial estimate nearly equal to a starting point that causes 
nonconvergence is expected to cause slow convergence. 

Even in relatively favorable cases, Newton’s method will often over- 
shoot during the early iterations. In some cases of initial overshoot, 4 a 
significant improvement in the rate of convergence has been achieved 
merely by limiting the magnitude of the vector change in the variables in 
each iteration. 

These experiences suggest strongly that, when Newton’s method is used, 
alternatives such as step limitation or inverse interpolation might be used 
when some measure of convergence, such as a norm of the residuals, is not 
reduced. If a truly nonconvergent sequence is encountered, a restart will 
be required. 

A second type of method of functional iteration involves only successive 
substitution in the application. Briefly, it is required that the equations 
be written in the form X = G(X) and that the function G(X) have a for- 
mal derivative G' less than unity. Specifically, if Xi and X 2 are any vec- 
tors in the neighborhood S of a point for which X = G(X) and 

|| G(X,) - G(Xi) || ^ K || X 2 - X x || K < 1 

the solution point X is unique. If, on the contrary, there were two solu- 
tions U x and U 2 in S, then clearly we would have 

|| G(Ui) - G(U 2 ) || = || JJ X - U 2 1| 

in violation of the hypothesis. Furthermore, if the hypothesis is satisfied, 
convergence is clearly established. 

Unfortunately, this method yields a domain of convergence that usually 
is small unless the function G is obtained by some elaborate process such 
as the formal expansion of Newton’s method; therefore, more often than 
not, divergence of an algorithm does not mean that the problem has no 
solution. On the other hand, convergence of an algorithm likewise does 
not of itself imply uniqueness, which must be established independently. 

This method has been very effective for applications in which it arises 
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naturally, such as the solution of integral equations and quasi-linear partial 
differential equations and the numerical approximation of the Picard iter- 
ation process as applied to systems of ordinary differential equations. 
Early efforts to apply this method to the problem of chemical equilibrium 
were abandoned in favor of the use of Newton’s method, because many 
different algorithms were required to achieve convergence in general situa- 
tions. This experience suggests that the method should be considered for 
problems of limited scope only. 


Methods of Descent 

The common feature of methods of descent is an explicit emphasis on 
the location of a minimum of a potential function or of some nonnegative 
(test) function of the residuals of a system of equations. The variants 
described in the literature range from the very formal “method of steepest 
descents” to the almost purely experimental “downhill” methods. 

The advantages of methods of descent as compared with methods of 
functional iteration are as follows: convergence in the large seems almost 
always to be assured and it is relatively easy, especially with the simplified 
methods, to prepare stored program computer coding for approximating 
the solution of an arbitrary problem and finding successive solutions when 
several are required. 

Starting from a geometric interpretation of a function <f> in n variables 
as a surface in a space of n + 1 dimensions, we designate the vector with 
components d<t>/dxi the gradient of <t> (grad <t>). If 4> is everywhere differ- 
entiable, grad <t> vanishes at the minima, maxima, and. saddle points of <t>. 
Elsewhere, the gradient is in the direction of the most rapid increase of 0. 
Further, if V is any unit vector and if we denote differentiation in the direc- 
tion of V by primes, then <f>' = V*grad <t> and * . 


= E 

ij 


ViVj 


d 2 <b 

dXidXj 


if the second derivatives exist. Provided that derivatives higher than the 
second are unimportant, the minimum of 4> in the direction of V from point 
Xk occurs approximately at Xk+i , where 

X k +i = X k - (V«'//) 

If V is the unit vector in the direction of grad </>, the iterative method is a 
method of steepest descent. The foregoing description is essentially that 
given by Householder 2 and. Booth. 1 

Because Newton’s method is an intermediate step, this descent method 
does not assure convergence and, when achieved, convergence may occur 
at a maximum or level point. Furthermore, the ultimate rate of conver- 
gence is first order. 

Although Booth 1 used this detailed method, lie noted that the effort in- 
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volved in computing second derivatives is excessive, and replaced the cal- 
culation of directional derivatives by finite difference approximations with 
the following result : 

Y _ Y + Vk 3(t>(Xk) “ + *(** + 2V*) 

* +1 * ^ 2 <t,(X k ) - 2 4 >{X k + V k ) + <t>(X k + 2V k ) 

where V* is a vector step size. Any use of this variant requires effective 
control of the- direction and magnitude of the vector step-size. Booth 
suggests the use of the step 

y = _1 4> grad <f> 

2 (grad <t>) 2 

which is effective if the process tends toward a zero of <f>, but may fail if 
too large a step-size is predicted when the process tends toward a level 
point that is not a zero of <j>. 

Regardless of the specific variant selected, certain rules of thumb may 
be used to avoid excursion toward a maximum. For example, negative 
values of or of 

4>(X k ) - 2$(Xk - V k ) + 4>{X k + 2V k ) 

indicate that the computation is tending toward a maximum or a saddle 
point, and a step in the opposite direction may be effective. Further, one 
should expect that <t>(Xi) will be smaller than any other computed value; 
if not, the location of the smallest value of <£ may be used for further cal- 
culation. 

The recent trend in descent methods stresses simplicity of coding for a 
stored program computer; in particular, the calculation of derivatives 
tends to be avoided. The extremes of this trend occur in the downhill 
methods of Ward 5 and of Lance, 6 in which the zeros or minima of test func- 
tions are sought by systematic sampling procedures. The methods as 
published have been applied to the location of zeros of analytic functions 
of a complex variable. 

By Ward’s method, values of the test function are computed at a start- 
ing point X 0 and at X Q ± he< . If the value of the test function is reduced, 
the locus of the smallest value is used at the next approximation Xi . When 
no reduction is found, the step size h is halved. The calculation continues 
until h falls below some assigned limit. 

Lance has used similar data to approximate the gradient of the test 
function <f> from the relations 

— h — — & <t>(X o) — 4>(X o + hei) 

OX{ 

~ 2 — hei) ~ <f>(Xo + he^)] 


or 
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which lead, respectively, to these approximations for Xj : 
x h = x io + <j>(X o) - <j>(X o + het) 


or 

x tl = x io + | [<P(X o - h£ { ) - <t>(X o + hei)) 

Of these approximations Lance prefers the first because fewer values of 
the test function need be computed. 

Both authors use the Gerschgorin vector norm, fa = 52 I F,- 1, as a test 

i 

function in preference to various possible quadratic forms. In applica- 
tions to analytic functions of a complex variable, the real and imaginary 
parts are treated as separate functions. Ward has proved that in this 
case the test function fa cannot have relative minima. 

In addition to the obvious fact that an initially chosen step size may be 
inappropriate and hence may require many adjustments, an important 
difficulty inherent in these methods is the development of stalemate situa- 
tions in which the algorithm fails to predict a correct downhill path for 
any step size. 

Lance found that Ward’s method sometimes degenerated into a stale- 
mate situation when the test function “contained a trough’ which was 
inclined to the axes.” 6 This particular difficulty was treated effectively 
by Lance’s variation. 

However, Lance’s method failed on the simple problem 

Fi(x } y) = x = 0 
F 2 (x,y) = y = 0 

with the test function <j> = \x \ + y 2 , upon reaching the point x = 0, 
y = — It is easily verified that from this point there is no step size 
leading to a reduction of <t> by the first of Lance’s methods. In general, 
then, it can be said that Lance’s approximate gradient method may tend 
toward a stalemate situation when the test function has troughs along the 
coordinate axes. 

The probability of success can be increased in many ways, such as by the 
use of an alternative method or of several methods simultaneously. One 
method that may be suggested is the conditional use of a variable test 
function. In particular, the test function fa = 53 I F» | fails to have a 
gradient at any point where one of its elements F t changes sign. If the 
derivatives of F; are sufficiently large, the minimum value and the subse- 
quent descent path will lie in the hypersurface F* = 0. 

This property of the test function suggests an algorithm that is known 
to be effective in simple cases: 
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If a trial step of a downhill method fails to reduce the value of the test 
function and if at least one residual has changed sign, ( 1 ) interpolate along 
the step for the zero of the residual F ,• that changed sign, and (2) proceed 
for one step with the test function <fcs = 22 i F* I- If the modified test 

function is reduced, return to the original problem. 

In conclusion of this discussion of descents and inverse interpolation 
methods, a few other properties may be noted : 

(1) The descent or downhill methods seem to assure, in general, con- 
vergence in the large, but the rate of convergence and accuracy may be 
poor at multiple roots. The development of stalemate situations may 
prevent ultimate convergence. 

(2) For a wide variety of problems, a general technique can be devised 
requiring almost no modification from problem to problem. 

(3) Simple and effective methods are available for finding successive 
roots. 

(4) The one important fault of these methods is that they will cause 
.convergence to a point of indeterminacy of a system of equations, and 
generally will do so without giving warning of any kind. This fault, how- 
ever, may not be important in physically oriented problems. 

A Mixed Method 

Recently, Davidon 7 has described methods for locating a minimum of 
a potential function or quadratic test function, involving the calculation 
of an approximate inverse of the Jacobian matrix (d 2 <t>/dxidxj) without 
requiring that the Jacobian matrix be computed explicitly. The methods 
may include linear constraints also. 

Starting with an estimate X 0 of the solution and H 0 of the inverse J~ l 
of the Jacobian matrix, Davidon computed the gradient V 0 of <t> at X 0 and 
a new test point X 0 , 

Xq = X 0 — HqVo 

and the gradient V 0 at Xo . If <t>(Xo) < <t>(Xo) or if there is a minimum of 
<t> between X 0 and X 0 , the approximate inverse is modified so that the 
smallest value of <t> would have been attained in one step : 

X, = Xu - H X V o 

The difference matrix Hi — H 0 is so chosen that step Xi — X 0 is multiplied 
by a proper scalar and that all. perpendicular steps would be unchanged. 
The process is repeated until the predicted change in position is less than 
some specified limit. 

The method of modification depends on the following results from matrix 
algebra. If H 0 is a symmetric positive definite matrix, V any column vec- 
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tor, V r its transpose, and b a scalar, then the matrix 


= Ho + 


(HoV)(H«V) T 

V T (H 0 V) 


has the property that H X V = (1 + 6)//oV and, if Y is any vector such 
that (// 0 V) r Y = 0, then H 2 Y = // 0 Y. Further, if A 0 is the determinant 
of /-/o and Aj the determinant of H x , then A x = (1 + 6)A 0 . Finally, if 
II o is singular and nullifies a vector Z, then Hi also nullifies the vector Z. 

To impose a linear constraint of the form a#* = a, the initial matrix 
//o is chosen so that H^cii = 0, and the point Xo is chosen to satisfy the 
constraint condition. 

Davidon’s paper is devoted chiefly to the details of the flow diagrams 
for the execution of the method on a stored program computer, and it in- 
cludes valuable details concerning interpolation for the minimum of the 
test function. The process outlined is terminated bv random perturbation 
of the estimated locus as a check on significance. 

David on also describes a second method for the progressive adjustment 
of a; trial matrix H 0 , a method equivalent to a conjugate gradient method 
if 4> is a quadratic form in the independent variables. Starting from a 
point Xo where the gradient is V 0 with an estimate H 0 of the inverse matrix, 
a matrix H x is sought such that the steps 

Xo = Xo - II oVu 

Xi = Xq — Z/iVo 

would be achieved by the single step 

X l = Xq - H X V o 

This can always be achieved by the adjustment 

Ih = Ho - b(Ho%)(HoV Q ) T 

where 

' b = l/Vc-//o(Vo - Vo) 

if H o is symmetrical and if b can be computed.^ 

In the case of general potential and test functions derived from arbitrary 
nonlinear systems, this change might be excessive. Davidon lists rules by 
which the change in. the determinant of H may be bounded. 

Owing to the recent appearance of this method in the literature, no basis 
exists for a practical appraisal of it other than Davidon ’s comment that 
it had worked effectively in all applications up to the time of publication, 
and that it was believed to be faster and more likely to succeed than com- 
puting methods for solution of the same problem. 
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Conclusions 

Three methods for solving systems of nonlinear equations seem to be 
reasonably effective: Newton's method, methods of descent, and Davidon's 
method. 

Although they require more preparatory effort, Newton's and Davidon’s 
methods are mathematically preferable to the descent methods because of 
their ultimate quadratic convergence and because the existence of multiple 
roots or indeterminacy can be detected easily. If computational proce- 
dures for the formal development of derivatives and the codes for their 
numerical evaluation were available, Newton's method would be preferred 
over Davidon's. 

Descent methods, especially the downhill variety, are advantageous in 
their ease of coding and the relative ease of finding successive solutions, 
although their ultimate rate of convergence is only first order. As a result, 
the accuracy of the results may be inferior. 

Each method reviewed may fail where an alternative method is effective. 
All the methods may converge slowly or occasionally may fail to converge, 
but difficulties with convergence can be removed by the judicious use of 
mixed methods. 
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